Distance, dissimilarity index, and network community structure.
Abstract
We address the question of finding the community structure of a complex network. In an earlier work [H. Zhou, Phys. Rev. E 67, 041908 (2003)], the concept of a random walk on a network was introduced and a distance measure between vertices was defined. Here, based on this distance measure, we calculate the dissimilarity index between nearest-neighbor vertices of a network and design an algorithm that partitions the vertices into hierarchically organized communities. Each community is characterized by an upper and a lower dissimilarity threshold. The algorithm is applied to several artificial and real-world networks with excellent results. On artificially generated random modular networks, this method outperforms the algorithm based on edge betweenness centrality. For the yeast protein-protein interaction network, we identify many clusters with well-defined biological functions.
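The abstract does not spell out the distance or the dissimilarity index. The Python sketch below assumes one reading of them. The distance d(i, j) is taken to be the mean number of steps that a random walker needs to first reach j from i, where the walker hops to a uniformly chosen nearest neighbor at each step. The dissimilarity index of neighbors i and j is taken to be Λ(i, j) = sqrt(Σ_{k≠i,j} [d(i,k) - d(j,k)]²) / (n - 2). All function names are hypothetical. `threshold_communities` is a deliberately simplified single-threshold grouping and is not the hierarchical algorithm with upper and lower thresholds described above.

```python
import numpy as np


def mean_first_passage_distances(adj):
    """Assumed random-walk distance: d[i, j] is the expected number of steps
    for a walker starting at i, moving to a uniformly random neighbor at each
    step, to first reach j. Assumes an undirected, connected, unweighted graph."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Transition matrix: each row of the adjacency matrix divided by the degree.
    P = adj / adj.sum(axis=1, keepdims=True)
    d = np.zeros((n, n))
    for j in range(n):
        others = [k for k in range(n) if k != j]
        # Make j absorbing: drop its row and column, then solve
        # (I - Q) t = 1 for the mean first-passage times t into j.
        Q = P[np.ix_(others, others)]
        t = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
        d[others, j] = t
    return d


def dissimilarity_index(d, i, j):
    """Assumed form: root of the summed squared differences between the
    distances from i and from j to every third vertex k, divided by n - 2."""
    n = d.shape[0]
    ks = [k for k in range(n) if k not in (i, j)]
    return float(np.sqrt(np.sum((d[i, ks] - d[j, ks]) ** 2)) / (n - 2))


def threshold_communities(adj, d, threshold):
    """Hypothetical simplification: keep only edges whose dissimilarity index
    is below `threshold` and return the connected components, sorted."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    kept = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j] and dissimilarity_index(d, i, j) < threshold:
                kept[i].append(j)
                kept[j].append(i)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search over the kept (low-dissimilarity) edges.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in kept[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(comp))
    return groups


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the bridge edge 2-3.
    A = np.zeros((6, 6))
    for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        A[a, b] = A[b, a] = 1.0
    d = mean_first_passage_distances(A)
    print(threshold_communities(A, d, 2.0))  # [[0, 1, 2], [3, 4, 5]]
```

On the two-triangle example, Λ(0, 1) is zero by symmetry. Edges inside each triangle score at most about 1.09, and the bridge edge 2-3 scores 3.5. A threshold of 2.0 therefore separates the two triangles. Note that this distance is not symmetric in general: for example, d(0, 2) = 2 but d(2, 0) = 22/3.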
Related articles
Semi-supervised composite kernel learning using distance metric learning techniques
The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric directly affects the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
Detecting Community Structure in Complex Networks Using Bacterial Chemotaxis with Fuzzy C-means Clustering
Identification of (overlapping) communities/clusters in a complex network is a general problem in data mining of network data sets. In this paper, the bacterial chemotaxis (BC) strategy is used to maximize the modularity of a network, combined with iterative fuzzy c-means clustering procedures based on a dissimilarity index and on a diffusion distance. The proposed algorithm outperforms ...
Clustering gene expression data using random forest dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. Such data typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods are based on dissimilarities between observations computed with a distance function. As increa...
The yeast protein-protein interaction map is a highly modular network with a staircase community structure
Summary: The construction of genome-wide protein–protein interaction maps makes it feasible to study the global organization of proteins in a biological cell. Here the module organization of the protein–protein interaction network (PPIN) of budding yeast is investigated with Netwalk, an algorithm based on biased random (Brownian) walks. The yeast PPIN is a highly modular network; it has a modula...
Ecological Dissimilarity Analysis: A Simple Method of Demonstrating Community-Habitat Correlations for Frequency Data
We introduce an analysis method to demonstrate correlation between biota and the physical habitats that they occupy. Using the same calculations as does Nei’s genetic distance index, this method builds independent dissimilarity matrices for both habitat and fauna, which can then be compared in a common statistical framework. An important advantage of this method is that only frequency data are ...
Journal title:
- Physical review. E, Statistical, nonlinear, and soft matter physics
Volume 67, Issue 6 Pt 1
Publication date: 2003